Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
Authors
Abstract
This paper proposes a simple yet effective framework for semi-supervised dependency parsing at the entire-tree level, referred to as ambiguity-aware ensemble training. Instead of using only 1-best parse trees, as in previous work, our core idea is to utilize parse forests (ambiguous labelings) to combine multiple 1-best parse trees generated by diverse parsers on unlabeled data. With a conditional random field (CRF) based probabilistic dependency parser, our training objective is to maximize the mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings. This framework offers two promising advantages. 1) The ambiguity encoded in parse forests mitigates the noise in 1-best parse trees: during training, the parser is aware of these ambiguous structures and has the flexibility to distribute probability mass to its preferred parse trees as long as the likelihood improves. 2) Diverse syntactic structures produced by different parsers can be naturally compiled into a forest, offering complementary strength to our single-view parser. Experimental results on benchmark data show that our method significantly outperforms the baseline supervised parser and other entire-tree based semi-supervised methods, such as self-training, co-training, and tri-training.
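As a rough illustration of this objective, the toy sketch below builds a forest for an unlabeled sentence as the union of the 1-best trees proposed by several parsers, and maximizes the probability mass the model assigns to that forest alongside the usual log-likelihood of gold trees on labeled data. The first-order arc-scoring scheme, the brute-force enumeration of trees, and all function names (`tree_score`, `mixed_log_likelihood`, etc.) are illustrative assumptions, not the paper's actual CRF parser.

```python
import math
from itertools import product

def tree_score(weights, sentence, heads):
    """Toy first-order score: sum of (head word, modifier word) arc weights."""
    return sum(
        weights.get(("<ROOT>" if h < 0 else sentence[h], sentence[m]), 0.0)
        for m, h in enumerate(heads)
    )

def all_head_assignments(n):
    """Enumerate every head assignment for an n-word sentence (toy; ignores
    projectivity and cycle constraints, so only use on very short sentences)."""
    return [h for h in product(range(-1, n), repeat=n)
            if all(h[i] != i for i in range(n))]

def log_prob_of_set(weights, sentence, tree_set):
    """log of the total probability mass a log-linear model puts on a set of trees."""
    scores = [tree_score(weights, sentence, h) for h in all_head_assignments(len(sentence))]
    log_z = math.log(sum(math.exp(s) for s in scores))
    num = sum(math.exp(tree_score(weights, sentence, t)) for t in tree_set)
    return math.log(num) - log_z

def mixed_log_likelihood(weights, labeled, unlabeled_with_forests):
    """Labeled part: ordinary log-likelihood of the gold tree.
    Unlabeled part: log-likelihood marginalized over the ambiguous forest,
    i.e. the union of 1-best trees proposed by several diverse parsers."""
    ll = sum(log_prob_of_set(weights, sent, [gold]) for sent, gold in labeled)
    ll += sum(log_prob_of_set(weights, sent, forest)
              for sent, forest in unlabeled_with_forests)
    return ll

# Toy usage: two hypothetical parsers disagree on the head of "fish";
# both 1-best trees are merged into the forest for this unlabeled sentence.
sent = ["I", "eat", "fish"]
forest = [[1, -1, 1], [1, -1, 0]]        # head index per word, -1 = root
w = {("<ROOT>", "eat"): 1.0, ("eat", "I"): 0.5, ("eat", "fish"): 0.5}
print(mixed_log_likelihood(w, labeled=[], unlabeled_with_forests=[(sent, forest)]))
```

Because the unlabeled term sums over every tree in the forest, the model is free to shift probability mass toward whichever of the candidate parses it prefers, which is exactly the flexibility the abstract describes.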
Similar resources
Indonesian Dependency Treebank: Annotation and Parsing
We introduce and describe ongoing work on our Indonesian dependency treebank. We describe the characteristics of the source data as well as our annotation guidelines for creating the dependency structures. Reported within are results from the start of the Indonesian dependency treebank. We also show ensemble dependency parsing and self-training approaches applicable to under-resourced...
Learning Structured Classifiers for Statistical Dependency Parsing
In this thesis, I present three supervised machine learning approaches and one semi-supervised approach for improving statistical natural language dependency parsing. I first introduce a generative approach that uses a strictly lexicalised parsing model in which all the parameters are based on words, without using any part-of-speech (POS) tags or grammatical categories. Then I present an improved large margi...
Semi-Supervised Convex Training for Dependency Parsing
We present a novel semi-supervised training algorithm for learning dependency parsers. By combining a supervised large margin loss with an unsupervised least squares loss, a discriminative, convex, semi-supervised learning algorithm can be obtained that is applicable to large-scale problems. To demonstrate the benefits of this approach, we apply the technique to learning dependency parsers from...
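A rough sketch of an objective in this spirit is given below: a supervised structured hinge (large margin) loss plus an unsupervised least-squares term, both convex in the parameter vector. The feature representation, candidate sets, pseudo-targets, and the `lam` weight are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def combined_objective(w, labeled, unlabeled, lam=0.1, c=1.0):
    """Convex semi-supervised objective in the spirit described above.

    labeled:   list of (gold_feats, candidate_feats) pairs, where candidate_feats
               is a list of feature vectors for competing (incorrect) parses.
    unlabeled: list of (feats, pseudo_target) pairs used by the least-squares term.
    """
    # Supervised structured hinge loss: the gold parse should outscore
    # every competing candidate by a margin of 1.
    hinge = 0.0
    for gold_f, candidates in labeled:
        gold_score = float(w @ gold_f)
        for f in candidates:
            hinge += max(0.0, 1.0 + float(w @ f) - gold_score)
    # Unsupervised least-squares loss on unlabeled examples.
    lsq = sum((float(w @ f) - t) ** 2 for f, t in unlabeled)
    # Both loss terms and the L2 regularizer are convex in w, so the sum is convex.
    return c * hinge + lam * lsq + 0.5 * float(w @ w)

# Toy usage with 3-dimensional feature vectors.
w = np.array([0.2, -0.1, 0.4])
labeled = [(np.array([1.0, 0.0, 1.0]), [np.array([0.0, 1.0, 1.0])])]
unlabeled = [(np.array([0.5, 0.5, 0.0]), 0.3)]
print(combined_objective(w, labeled, unlabeled))
```

Convexity of both terms is what makes the combined problem tractable at scale: any local minimum of the summed objective is global.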
Simple Semi-supervised Dependency Parsing
We present a simple and effective semi-supervised method for training dependency parsers. We focus on the problem of lexical representation, introducing features that incorporate word clusters derived from a large unannotated corpus. We demonstrate the effectiveness of our approach in a series of dependency parsing experiments on the Penn Treebank, and we show that our cluster-based features yiel...
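For a sense of what such features look like, the sketch below fires cluster-prefix features for a head-modifier pair. The bit-strings, prefix lengths, and the `CLUSTERS` table are made-up stand-ins for clusters that would in practice be induced from a large unannotated corpus.

```python
# Hypothetical Brown-style cluster bit-strings; in practice these would be
# induced from a large unannotated corpus, not hard-coded.
CLUSTERS = {"the": "0010", "dog": "0111", "barks": "1100"}

def cluster_features(head, mod, prefix_lens=(2, 4)):
    """Cluster-prefix features for a head-modifier arc, fired alongside
    the usual lexical and POS-based features."""
    feats = [f"HEAD_WORD={head}|MOD_WORD={mod}"]
    for k in prefix_lens:
        hc = CLUSTERS.get(head, "")[:k]
        mc = CLUSTERS.get(mod, "")[:k]
        feats.append(f"HEAD_C{k}={hc}|MOD_C{k}={mc}")
    return feats

print(cluster_features("barks", "dog"))
```

Short cluster prefixes act like coarse word classes, so the parser can generalize from words seen in the treebank to unseen words that share a cluster.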
Working with a small dataset - semi-supervised dependency parsing for Irish
We present a number of semi-supervised parsing experiments on the Irish language carried out using a small seed set of manually parsed trees and a larger, yet still relatively small, set of unlabelled sentences. We take two popular dependency parsers – one graph-based and one transition-based – and compare results for both. Results show that using semi-supervised learning in the form of self-tra...
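A schematic self-training loop of the kind these experiments compare is sketched below; the parser interface (`train`, `parse_with_score`), the confidence threshold, and the number of rounds are hypothetical placeholders, not the actual graph-based or transition-based parsers used in the paper.

```python
def self_train(parser, seed_trees, unlabeled_sents, rounds=3, threshold=0.9):
    """Grow the training set with the parser's own high-confidence parses.

    `parser` is any object exposing train(trees) and parse_with_score(sentence)
    -> (tree, confidence); this interface is a placeholder, not a real library API.
    """
    parser.train(list(seed_trees))                   # start from the small seed set
    for _ in range(rounds):
        added = []
        for sent in unlabeled_sents:
            tree, confidence = parser.parse_with_score(sent)
            if confidence >= threshold:              # keep only confident auto-parses
                added.append(tree)
        if not added:
            break
        parser.train(list(seed_trees) + added)       # seed trees always stay in the mix
    return parser
```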